    On the Opportunities and Challenges of Offline Reinforcement Learning for Recommender Systems

    Reinforcement learning (RL) is a powerful tool for modeling dynamic user interests in recommender systems and has attracted growing research attention. However, it has a significant drawback: poor data efficiency, which stems from its interactive nature. Training RL-based recommender systems requires expensive online interactions to collect enough trajectories for agents to learn user preferences. This inefficiency makes building RL-based recommender systems a difficult undertaking and motivates the search for solutions. Recent advances in offline reinforcement learning offer a new perspective: offline RL enables agents to learn from previously collected datasets and then deploy the learned policies online. Because recommender systems already possess extensive offline datasets, the offline RL framework is a natural fit. Although the field is growing, work on recommender systems that use offline RL remains limited. This survey introduces offline reinforcement learning in the context of recommender systems and provides a comprehensive review of the existing literature. We also highlight prevalent challenges, opportunities, and future directions to advance research in this evolving field. Comment: under review
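    The abstract's core idea, learning a policy purely from logged data with no new online interactions, can be illustrated with a toy sketch. Everything below is a hypothetical assumption, not taken from the survey: sessions are lists of (item, reward) pairs, the state is just the last positively rewarded item, and the learner is plain tabular Q-learning that replays the fixed dataset. The Bellman backup maximises only over actions actually logged in the next state, a crude nod to the extrapolation problem that offline RL methods address.

```python
from collections import defaultdict

def build_transitions(sessions):
    """Convert logged sessions into (state, action, reward, next_state, done) tuples.

    Hypothetical setup: each session is a list of (item, reward) pairs in logged
    order, and the state is the last item with positive reward (None at start).
    """
    transitions = []
    for session in sessions:
        state = None
        for step, (item, reward) in enumerate(session):
            next_state = item if reward > 0 else state
            done = step == len(session) - 1
            transitions.append((state, item, reward, next_state, done))
            state = next_state
    return transitions

def offline_q_learning(transitions, epochs=50, alpha=0.1, gamma=0.9):
    """Tabular Q-learning that only replays a fixed dataset (no online interaction)."""
    q = defaultdict(float)
    # Actions actually logged in each state; the backup maximises only over these,
    # a crude guard against overvaluing actions the data never covers.
    logged = defaultdict(set)
    for s, a, _, _, _ in transitions:
        logged[s].add(a)
    for _ in range(epochs):
        for s, a, r, s_next, done in transitions:
            future = max((q[(s_next, b)] for b in logged[s_next]), default=0.0)
            target = r if done else r + gamma * future
            q[(s, a)] += alpha * (target - q[(s, a)])
    return q

def recommend(q, state, candidates):
    """Greedy policy: pick the candidate item with the highest learned Q-value."""
    return max(candidates, key=lambda item: q[(state, item)])

sessions = [
    [("a", 0), ("b", 1), ("c", 1)],
    [("a", 0), ("b", 1), ("d", 0)],
]
q = offline_q_learning(build_transitions(sessions))
print(recommend(q, "b", ["c", "d"]))  # c
```

    Real offline RL recommenders use far richer state representations and dedicated conservative or behavior-constrained objectives; this sketch only shows the "train on a fixed log, deploy greedily" loop.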

    Intrinsically Motivated Reinforcement Learning based Recommendation with Counterfactual Data Augmentation

    Deep reinforcement learning (DRL) has proven effective at capturing users' dynamic interests in recent literature. However, training a DRL agent is challenging because the environment in recommender systems (RS) is sparse: a DRL agent must divide its time between exploring informative user-item interaction trajectories and using existing trajectories for policy learning. This is known as the exploration-exploitation trade-off, and it significantly affects recommendation performance when the environment is sparse. Balancing exploration and exploitation is even harder in DRL-based RS, where the agent needs to explore informative trajectories deeply and exploit them efficiently. As a step toward addressing this issue, we design a novel intrinsically motivated reinforcement learning method that improves the agent's ability to explore informative interaction trajectories in the sparse environment; these trajectories are further enriched via a counterfactual augmentation strategy for more efficient exploitation. Extensive experiments on six offline datasets and three online simulation platforms demonstrate the superiority of our model over a set of existing state-of-the-art methods.
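    To make "intrinsic motivation" concrete, here is a minimal sketch of one common way to reward exploration: a count-based bonus of beta / sqrt(N(s, a)) added to the extrinsic reward, so rarely tried (state, action) pairs look more attractive. This is a generic, swapped-in technique for illustration only; the abstract does not specify the paper's intrinsic reward, and the class name and `beta` parameter are hypothetical.

```python
import math
from collections import Counter

class CountBasedBonus:
    """Adds an intrinsic reward beta / sqrt(N(s, a)) to the extrinsic reward.

    A generic count-based exploration bonus used as a stand-in; it is not the
    intrinsic-motivation method proposed in the paper above.
    """

    def __init__(self, beta=0.5):
        self.beta = beta
        self.counts = Counter()  # visit counts per (state, action) pair

    def shaped_reward(self, state, action, extrinsic):
        self.counts[(state, action)] += 1
        bonus = self.beta / math.sqrt(self.counts[(state, action)])
        return extrinsic + bonus

shaper = CountBasedBonus(beta=1.0)
first = shaper.shaped_reward("user_1", "item_42", 0.0)   # 1.0
second = shaper.shaped_reward("user_1", "item_42", 0.0)  # about 0.707
novel = shaper.shaped_reward("user_1", "item_7", 0.0)    # 1.0
```

    The bonus decays as a pair is revisited, so a sparse environment with few extrinsic rewards still pushes the agent toward unexplored recommendations.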

    Contrastive Counterfactual Learning for Causality-aware Interpretable Recommender Systems

    There has been a recent surge in studies that generate recommendations within the framework of causal inference, treating the recommendation as a treatment. This approach improves our understanding of how recommendations influence user behaviour and helps identify the factors that drive this impact. Many researchers in causal inference for recommender systems have focused on propensity scores, which can reduce bias but may also introduce additional variance. Other studies have proposed using unbiased data from randomized controlled trials, though this approach requires assumptions that may be difficult to satisfy in practice. In this paper, we first explore the causality-aware interpretation of recommendations and show that the underlying exposure mechanism can bias the maximum likelihood estimation (MLE) of observational feedback. Because confounders may not be measurable, we propose using contrastive self-supervised learning (SSL) to reduce exposure bias, specifically through inverse propensity scores and an expanded positive sample set. Based on these theoretical findings, we introduce a new contrastive counterfactual learning method (CCL) that integrates three novel positive sampling strategies based on estimated exposure probability or random counterfactual samples. Through extensive experiments on two real-world datasets, we demonstrate that CCL outperforms state-of-the-art methods. Comment: conference
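    The abstract relies on inverse propensity scores (IPS) to correct exposure bias. As a rough illustration, the sketch below shows the standard IPS-weighted binary cross-entropy: each observed pair's loss is divided by its estimated exposure probability and the sum is averaged over all user-item pairs, which is unbiased for the full-matrix loss when propensities are correct. It is not CCL's contrastive objective; the function name, the callables, and the clipping threshold are hypothetical choices. Clipping small propensities is a common way to limit the extra variance the abstract mentions.

```python
import math

def ips_bce_loss(observed, propensity, predict, n_pairs, clip=0.05):
    """Inverse-propensity-scored binary cross-entropy (hypothetical helper).

    observed:   list of (user, item, label) pairs that were actually exposed/logged
    propensity: callable giving the estimated exposure probability P(O=1 | u, i)
    predict:    callable giving the model's click probability, strictly in (0, 1)
    n_pairs:    total number of user-item pairs the loss is meant to estimate over
    clip:       lower bound on propensities, capping each weight at 1 / clip
    """
    total = 0.0
    for user, item, label in observed:
        p = max(propensity(user, item), clip)  # clipping trades a little bias for less variance
        s = predict(user, item)
        bce = -(label * math.log(s) + (1 - label) * math.log(1 - s))
        total += bce / p  # rarely exposed pairs receive larger weight
    return total / n_pairs

# One observed positive with propensity 0.5 out of 2 possible pairs.
loss = ips_bce_loss([("u1", "i1", 1)], lambda u, i: 0.5, lambda u, i: 0.5, n_pairs=2)
```

    With a propensity of 0.5 the single observed pair counts twice, compensating for the pair that was never exposed.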